Integrative Analysis Frameworks for Improved Peptide and Protein Identifications from Tandem Mass Spectrometry Data.
Tandem mass spectrometry (MS/MS) followed by database search is the method of choice for high-throughput protein identification in modern proteomic studies. Database searching methods employ spectral matching algorithms and statistical models to identify and quantify proteins in a sample. The major focus of these statistical methods is to assign probability scores to the identifications, distinguishing high-confidence, reliable identifications that may be accepted (typically corresponding to a false discovery rate, FDR, of 1% or 5%) from lower-confidence, spurious identifications that are rejected. These identification probabilities are generally determined using only evidence from the MS/MS data. However, given the wealth of external (orthogonal) data available for most biological systems, integrating such orthogonal information into proteomics analysis pipelines is a promising way to improve their sensitivity and rescue true positive identifications that were rejected for lack of sufficient supporting evidence.
In this dissertation, approaches based on naive Bayes rescoring, search space restriction, and a hybrid approach that combines both are described for integrating orthogonal information into proteomic analysis pipelines. These methods have been applied to integrate transcript abundance data from RNA-seq and identification frequency data from the Global Proteome Machine database, GPMDB (one of the largest repositories of proteomic experiment results), into analysis pipelines, increasing the number of peptide and protein identifications from MS/MS data. Further, estimation of false discovery rates in very large proteomic datasets was also investigated. In very large datasets, which usually result from integrating data from multiple experiments, some assumptions used in typical target-decoy based FDR estimation for smaller datasets no longer hold, resulting in artificially inflated error rates. Alternative approaches that allow accurate FDR estimation in these large-scale datasets have been described and benchmarked.
PhD, Bioinformatics, University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/116717/1/avinashs_1.pd
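The target-decoy FDR estimation mentioned in the dissertation abstract above can be illustrated with a minimal Python sketch. It assumes the common convention of estimating FDR as the number of decoy matches divided by the number of target matches at or above a score threshold. The function names and the (score, is_decoy) input layout are hypothetical, and this is not the alternative large-dataset approach that the dissertation benchmarks.

```python
from typing import List, Optional, Tuple

# Each peptide-spectrum match (PSM) is a (score, is_decoy) pair.
# Assumption: a higher score means a better match.
PSM = Tuple[float, bool]


def target_decoy_fdr(psms: List[PSM], threshold: float) -> float:
    """Estimate FDR among PSMs scoring at or above `threshold`."""
    targets = sum(1 for score, is_decoy in psms
                  if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms
                 if score >= threshold and is_decoy)
    if targets == 0:
        return 0.0
    # Decoy hits approximate the number of false target hits.
    return min(1.0, decoys / targets)


def score_cutoff_for_fdr(psms: List[PSM], max_fdr: float = 0.01) -> Optional[float]:
    """Return the lowest observed score whose estimated FDR is <= max_fdr."""
    best = None
    for threshold in sorted({score for score, _ in psms}, reverse=True):
        if target_decoy_fdr(psms, threshold) <= max_fdr:
            best = threshold
    return best
```

In this sketch, relaxing `max_fdr` from 0.01 to 0.05 can only lower (or keep) the returned cutoff, which is the trade-off between accepted identifications and error rate that the abstract describes.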
Highlights from the ISCB Student Council Symposium 2013
This report summarizes the scientific content and activities of the annual symposium organized by the Student Council of the International Society for Computational Biology (ISCB), held in conjunction with the Intelligent Systems for Molecular Biology (ISMB) / European Conference on Computational Biology (ECCB) conference in Berlin, Germany, on July 19, 2013.
Effective Leveraging of Targeted Search Spaces for Improving Peptide Identification in Tandem Mass Spectrometry Based Proteomics
In shotgun proteomics, peptides are typically identified using database searching, which involves scoring acquired tandem mass spectra against peptides derived from standard protein sequence databases such as UniProt, RefSeq, or Ensembl. In this strategy, the sensitivity of peptide identification is known to be affected by the size of the search space. Therefore, creating a targeted sequence database containing only peptides likely to be present in the analyzed sample can be a useful technique for improving the sensitivity of peptide identification. In this study, we describe how targeted peptide databases can be created based on the frequency of identification in the Global Proteome Machine Database (GPMDB), the largest publicly available repository of peptide and protein identification data. We demonstrate that targeted peptide databases can be easily integrated into existing proteome analysis workflows and describe a computational strategy for minimizing any loss of peptide identifications arising from potential incompleteness of the targeted search spaces. We demonstrate the performance of our workflow using several data sets of varying size and sample complexity.
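As a rough illustration of the search space restriction described above, the following Python sketch keeps only peptides that have been observed at least a given number of times. The observation counts, the `min_observations` cutoff, and both function names are hypothetical; the abstract does not state how the frequency threshold is chosen or how GPMDB data are accessed, and the strategy for handling search space incompleteness is not modeled here.

```python
from typing import Dict, Iterable, List, Set


def build_targeted_peptides(observation_counts: Dict[str, int],
                            min_observations: int = 1) -> Set[str]:
    """Keep peptides observed at least `min_observations` times.

    `observation_counts` maps a peptide sequence to how often it was
    identified in a reference repository (assumed input, e.g. from GPMDB).
    """
    return {peptide for peptide, count in observation_counts.items()
            if count >= min_observations}


def restrict_search_space(candidates: Iterable[str],
                          targeted: Set[str]) -> List[str]:
    """Filter candidate peptides down to the targeted set, preserving order."""
    return [peptide for peptide in candidates if peptide in targeted]
```

A smaller candidate list means fewer chances for a random high-scoring match, which is why restricting the search space can improve sensitivity at a fixed error rate.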
Utility of RNA-seq and GPMDB Protein Observation Frequency for Improving the Sensitivity of Protein Identification by Tandem MS
Tandem mass spectrometry (MS/MS) followed by database search is the method of choice for protein identification in proteomic studies. Database searching methods employ spectral matching algorithms and statistical models to identify and quantify proteins in a sample. In general, these methods do not use any information other than spectral data for protein identification. However, considering the wealth of external data available for many biological systems, analysis methods can incorporate such information to improve the sensitivity of protein identification. In this study, we present a method that uses Global Proteome Machine Database identification frequencies and RNA-seq transcript abundances to adjust the confidence scores of protein identifications. The method described is particularly useful for samples with low-to-moderate proteome coverage (i.e., <2000–3000 proteins), where we observe up to an 8% improvement in the number of proteins identified at a 1% false discovery rate.
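One way to picture how external evidence might adjust a confidence score is a naive Bayes style update, sketched below in Python. This is an illustrative assumption, not the published model: the MS/MS-based probability is converted to odds, multiplied by one likelihood ratio per independent evidence source (for example, GPMDB identification frequency and RNA-seq abundance), and converted back. The function name and the likelihood ratio values are hypothetical; in practice such ratios would have to be estimated from data.

```python
from typing import Iterable


def rescore_probability(ms_probability: float,
                        likelihood_ratios: Iterable[float]) -> float:
    """Adjust an identification probability with external evidence.

    Each likelihood ratio is P(evidence | protein present) divided by
    P(evidence | protein absent). Evidence sources are assumed to be
    conditionally independent (the naive Bayes assumption).
    """
    if not 0.0 < ms_probability < 1.0:
        # Probabilities of exactly 0 or 1 cannot be moved by evidence.
        return ms_probability
    odds = ms_probability / (1.0 - ms_probability)
    for ratio in likelihood_ratios:
        odds *= ratio
    return odds / (1.0 + odds)
```

A ratio above 1 (supporting evidence, such as a highly expressed transcript) raises the probability, while a ratio below 1 lowers it; a borderline identification can therefore cross the acceptance threshold in either direction.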
Highlights from the 1st ISCB Latin American Student Council Symposium 2014
This report summarizes the scientific content and activities of the first edition of the Latin American Symposium organized by the Student Council of the International Society for Computational Biology (ISCB), held in conjunction with the Third Latin American conference of the International Society for Computational Biology (ISCB-LA 2014) in Belo Horizonte, Brazil, on October 27, 2014.
Affiliation: Parra, Rodrigo Gonzalo. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Ciudad Universitaria. Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales; Argentina
Affiliation: Simonetti, Franco Lucio. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Parque Centenario. Instituto de Investigaciones Bioquímicas de Buenos Aires. Fundación Instituto Leloir. Instituto de Investigaciones Bioquímicas de Buenos Aires; Argentina
Affiliation: Hasenahuer, Marcia Anahí. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - La Plata. Instituto Multidisciplinario de Biología Celular. Grupo Vinculado al IMBICE - Grupo de Biología Estructural y Biotecnología-Universidad Nacional de Quilmes - GBEyB | Provincia de Buenos Aires. Gobernación. Comisión de Investigaciones Científicas. Instituto Multidisciplinario de Biología Celular. Grupo Vinculado al IMBICE - Grupo de Biología Estructural y Biotecnología-Universidad Nacional de Quilmes - GBEyB | Universidad Nacional de la Plata. Instituto Multidisciplinario de Biología Celular. Grupo Vinculado al IMBICE - Grupo de Biología Estructural y Biotecnología-Universidad Nacional de Quilmes - GBEyB; Argentina
Affiliation: Olguin-Orellana, Gabriel J. Pontificia Universidad Católica de Chile; Chile
Affiliation: Shanmugam, Avinash K. University of Michigan; United States